Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover's Distance Regularization

Authors

  • Meng Zhang
  • Yang Liu
  • Huanbo Luan
  • Yiqun Liu
  • Maosong Sun
Abstract

Pages 3188–3198, Osaka, Japan, December 11–17, 2016.

Meng Zhang†‡, Yang Liu†‡, Huanbo Luan†, Yiqun Liu†, Maosong Sun†‡

† State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China
‡ Jiangsu Collaborative Innovation Center for Language Competence, Jiangsu, China

{yiqunliu, sms}@tsinghua.edu.cn

Similar papers

Earth Mover's Distance Minimization for Unsupervised Bilingual Lexicon Induction

Cross-lingual natural language processing hinges on the premise that there exists invariance across languages. At the word level, researchers have identified such invariance in the word embedding semantic spaces of different languages. However, in order to connect the separate spaces, cross-lingual supervision encoded in parallel data is typically required. In this paper, we attempt to establis...
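
The abstract above frames lexicon induction as bringing two embedding spaces close under Earth Mover's Distance. To make that quantity concrete, here is a minimal sketch, assuming two equally sized point sets with uniform weights and Euclidean ground distance. The helper names and toy points are hypothetical, and this brute-force computation is only an illustration of EMD itself, not the paper's training procedure.

```python
import math
from itertools import permutations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def emd_uniform(xs, ys):
    """EMD between two point sets of equal size n, each point carrying mass 1/n.

    With uniform masses an optimal transport plan can be taken to be a
    permutation, so the EMD equals the mean cost of the best one-to-one
    matching. This brute force is O(n!) and is only meant for toy inputs.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized point sets")
    n = len(xs)
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(n)):
        # perm[i] is the index of the target point matched to source point i
        cost = sum(euclidean(xs[i], ys[j]) for i, j in enumerate(perm)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


# Hypothetical 2-D "embeddings": the target set is the source set shifted
# by (0.5, 0) and listed in a different order.
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
target = [(0.5, 1.0), (0.5, 0.0), (1.5, 0.0)]
cost, matching = emd_uniform(source, target)
print(cost, matching)  # 0.5 (1, 2, 0)
```

Besides the distance, the optimal matching pairs each source point with a target point, which is one intuitive way to read a transport plan as candidate translation pairs.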

A Parallel Method for Earth Mover's Distance

We propose a new algorithm to approximate the Earth Mover’s distance (EMD). Our main idea is motivated by the theory of optimal transport, in which EMD can be reformulated as a familiar L1 type minimization. We use a regularization which gives us a unique solution for this L1 type problem. The new regularized minimization is very similar to problems which have been solved in the fields of compr...
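
The abstract notes that EMD can be rewritten as an L1-type minimization over a flux of mass. The simplest case to see this is one dimension, where mass conservation fixes the flux across each bin boundary as the difference of cumulative sums, so the minimization has a closed form. The sketch below assumes two histograms on a shared, evenly spaced grid (function and parameter names are hypothetical); it covers only this 1-D special case and does not implement the regularized parallel solver the abstract describes.

```python
import math
from itertools import accumulate


def emd_1d(p, q, dx=1.0):
    """EMD between two histograms on the same evenly spaced 1-D grid.

    Conservation fixes the flux across the boundary after bin k to
    P_k - Q_k, where P and Q are cumulative sums. Only one feasible flux
    exists, so the L1 objective is simply EMD = dx * sum_k |P_k - Q_k|.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same grid")
    if not math.isclose(sum(p), sum(q), abs_tol=1e-9):
        raise ValueError("histograms must carry equal total mass")
    flux = [a - b for a, b in zip(accumulate(p), accumulate(q))]
    return dx * sum(abs(f) for f in flux)


print(emd_1d([1, 0, 0], [0, 0, 1]))          # 2.0
print(emd_1d([0.5, 0.5, 0], [0, 0.5, 0.5]))  # 1.0
```

In the first call, one unit of mass travels two bins; in the second, two half-units each travel one bin.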

Building Earth Mover's Distance on Bilingual Word Embeddings for Machine Translation

Following their monolingual counterparts, bilingual word embeddings are also on the rise. As a major application task, word translation has been relying on the nearest neighbor to connect embeddings cross-lingually. However, the nearest neighbor strategy suffers from its inherently local nature and fails to cope with variations in realistic bilingual word embeddings. Furthermore, it lacks a mec...
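
To illustrate the nearest-neighbor strategy the abstract criticizes, here is a minimal sketch, assuming the bilingual embeddings already share one vector space and that similarity is cosine. The vocabularies and vectors are hypothetical toy data.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbor(src_vec, tgt_emb):
    """Translate one source vector to its most cosine-similar target word."""
    return max(tgt_emb, key=lambda w: cosine(src_vec, tgt_emb[w]))


def translate_all(src_emb, tgt_emb):
    """Each source word picks its neighbor independently: a purely local decision."""
    return {w: nearest_neighbor(v, tgt_emb) for w, v in src_emb.items()}


# Hypothetical bilingual embeddings, assumed to already live in a shared space.
src = {"cat": (0.9, 0.1), "dog": (0.6, 0.4)}
tgt = {"gato": (1.0, 0.0), "perro": (0.0, 1.0)}
print(translate_all(src, tgt))  # {'cat': 'gato', 'dog': 'gato'}
```

Because every decision is made in isolation, nothing prevents two source words from claiming the same target word while another target word goes unused, as in the toy output above.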

Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision

Building bilingual lexica from non-parallel data is a longstanding natural language processing research problem that could benefit thousands of resource-scarce languages which lack parallel data. Recent advances of continuous word representations have opened up new possibilities for this task, e.g. by establishing cross-lingual mapping between word embeddings via a seed lexicon. The method is h...
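
A widely used way to establish a cross-lingual mapping from a seed lexicon is to fit a linear map by least squares on the seed pairs and then translate by nearest neighbor in the target space, as in the well-known baseline of Mikolov et al. (2013). The sketch below assumes that setup and uses NumPy. It is a standard baseline shown for illustration, not necessarily the method of the listed paper, and all words and vectors are hypothetical.

```python
import numpy as np


def learn_linear_map(src_emb, tgt_emb, seed_pairs):
    """Least-squares W minimizing ||X W - Z||_F^2 over the seed translation pairs."""
    X = np.array([src_emb[s] for s, _ in seed_pairs], dtype=float)
    Z = np.array([tgt_emb[t] for _, t in seed_pairs], dtype=float)
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return W


def translate(word, src_emb, tgt_emb, W):
    """Map a source word with W, then return the cosine-nearest target word."""
    mapped = np.asarray(src_emb[word], dtype=float) @ W

    def cos(v):
        v = np.asarray(v, dtype=float)
        return float(mapped @ v / (np.linalg.norm(mapped) * np.linalg.norm(v)))

    return max(tgt_emb, key=lambda w: cos(tgt_emb[w]))


# Hypothetical toy spaces: the target space is the source space rotated by 90 degrees.
src = {"one": (1.0, 0.0), "two": (0.0, 1.0), "three": (1.0, 1.0)}
tgt = {"uno": (0.0, 1.0), "dos": (-1.0, 0.0), "tres": (-1.0, 1.0)}
W = learn_linear_map(src, tgt, seed_pairs=[("one", "uno"), ("two", "dos")])
print(translate("three", src, tgt, W))  # tres
```

The word "three" is not in the seed lexicon, yet the map learned from two seed pairs carries it to the right target word; real embeddings are noisier and need far larger seed lexicons.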

Creating bilingual lexica using reference wordlists for alignment of monolingual semantic vector spaces

This paper proposes a novel method for automatically acquiring multilingual lexica from non-parallel data and reports some initial experiments to prove the viability of the approach. Using established techniques for building mono-lingual vector spaces two independent semantic vector spaces are built from textual data. These vector spaces are related to each other using a small reference word li...
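
The abstract is cut off before it explains how the reference word list relates the two spaces. One plausible scheme, assumed here rather than taken from the paper, describes each word by its similarities to the reference words of its own language. Since the reference words are translation pairs, position k of a source profile and position k of a target profile refer to the same concept, so profiles from independently built spaces become comparable. All names and vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def profile(vec, ref_vecs):
    """Describe a word by its similarity to each reference word of its own language."""
    return [cosine(vec, r) for r in ref_vecs]


def match_by_reference(query, src_emb, tgt_emb, ref_pairs):
    """Return the target word whose reference profile best matches the query's."""
    src_refs = [src_emb[s] for s, _ in ref_pairs]
    tgt_refs = [tgt_emb[t] for _, t in ref_pairs]
    q = profile(src_emb[query], src_refs)
    return max(tgt_emb, key=lambda w: cosine(q, profile(tgt_emb[w], tgt_refs)))


# Hypothetical spaces with unrelated coordinates (the target is rotated).
src = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 1.0)}
tgt = {"A": (0.0, 1.0), "B": (-1.0, 0.0), "C": (-1.0, 1.0)}
print(match_by_reference("c", src, tgt, ref_pairs=[("a", "A"), ("b", "B")]))  # C
```

The raw coordinates of "c" and "C" differ, but their similarity profiles against the two reference pairs agree, which is what lets the independently trained spaces be linked.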

Publication date: 2016